@nilseling
@NilsEling

1 Data and code availability

To follow this tutorial, please visit https://github.com/BodenmillerGroup/demos/tree/main/docs. The compiled .html file of this workshop is hosted at: https://bodenmillergroup.github.io/demos.

We will need to install the following packages for the workshop:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("imcRtools", "cytomapper", "tidyverse", 
                       "ggplot2", "viridis", "pheatmap", "scales")))

To reproduce the analysis, clone the repository:

git clone https://github.com/BodenmillerGroup/demos.git

and open the Bioc2022_workshop.Rmd file in the docs folder.

2 Introduction

Highly multiplexed imaging enables the simultaneous detection of tens of biological molecules (e.g. proteins, RNA; also referred to as “markers”) in their spatial tissue context. Recently established multiplexed imaging technologies rely on cyclic staining with immunofluorescently-tagged antibodies (Lin et al. 2018; Gut, Herrmann, and Pelkmans 2018), or the use of oligonucleotide-tagged Goltsev et al. (2018) or metal-tagged antibodies Angelo et al. (2014), among others. Across technologies, the acquired data are commonly stored as multi-channel images, where each pixel encodes the abundance of all acquired markers at a specific position in the tissue. After data acquisition, bioimage processing and segmentation are conducted to extract data for downstream analysis. When performing end-to-end multiplexed image analysis, the user is often faced with a diverse set of computational tools and complex analysis scripts.

Here, we present an interoperabale, modularized computational work ow to process and analyze multiplexed imaging data (Figure 1). The steinbock framework facilitates multi-channel image processing including raw data pre-processing, image segmentation and feature extraction. Data generated by steinbock can be directly read by the imcRtools R/Bioconductor package for data visualization and spatial analysis (Figure 1). The cytomapper package support image handling and composite as well as segmentation mask visualization.

The presented workflow is customizable, reproducible, user-friendly and integrates with a variety of downstream analysis strategies by employing standardized data formats. The tools comprised in this work ow support processing and analysis of data generated by a range of multiplexed imaging technologies. However, for demonstration purposes, we present data from Imaging Mass Cytometry (IMC), which relies on tissue staining with metal-labelled antibodies to jointly measure the spatial distribution of up to 40 proteins or RNA at 1μm resolution Schulz et al. (2018).

Figure 1: Overview of the multiplexed image processing and analysis workflow. Raw image data can be interactively visualized using napari plugins such as napari-imc for IMC, to assess data quality and for exploratory visualization. The steinbock framework performs image pre-processing, cell segmentation and single-cell data extraction using established approaches and standardized file formats. Data can be imported into R using the imcRtools package, which further supports spatial visualization and analysis. Storing the data in a SingleCellExperiment or SpatialExperiment object, imcRtools integrates with a variety of data analysis tools of the Bioconductor project such as cytomapper (Eling et al. 2020). Alternatively, steinbock exports data to the anndata format for analysis in Python, e.g. using squidpy.

The workshop in broadly structured in 4 parts:

  1. Image processing using the steinbock framework
  2. Data import into R
  3. Image visualization using cytomapper
  4. Spatial analysis using imcRtools

The analysis approaches presented here were taken from the IMC data analysis book. The book provides more detailed information on the technical underpinnings of the analysis.

More information can also be found in our preprint

We use imaging mass cytometry data to highlight the functionality of the cytomapper package. However, any imaging technology is supported as long as the data can be read into R (memory restrictions and file type restrictions.)

3 Image processing

In the first part of this workshop, we will present a new framework for multiplexed image processing.

3.1 Dowload IMC raw data

To highlight the basic steps of multiplexed image analysis, we provide example data that were acquired as part of the Integrated iMMUnoprofiling of large adaptive CANcer patient cohorts projects (immucan.eu). The raw data of 4 patients can be accessed online at zenodo.org/record/5949116.

At this point, we can download the raw data to test the steinbock framework in the next section.

In the interest of time, we will not download the data now and only conceptually discuss the steinbock framework.

dir.create("../data/steinbock/raw")
## Warning in dir.create("../data/steinbock/raw"): '../data/steinbock/raw' already
## exists
download.file("https://zenodo.org/record/5949116/files/panel.csv",
              "../data/steinbock/raw/panel.csv")
download.file("https://zenodo.org/record/5949116/files/Patient1.zip",
              "../data/steinbock/raw/Patient1.zip")
download.file("https://zenodo.org/record/5949116/files/Patient2.zip",
              "../data/steinbock/raw/Patient2.zip")
download.file("https://zenodo.org/record/5949116/files/Patient3.zip",
              "../data/steinbock/raw/Patient3.zip")
download.file("https://zenodo.org/record/5949116/files/Patient4.zip",
              "../data/steinbock/raw/Patient4.zip")

3.2 The steinbock framework

The steinbock framework offers tools for multi-channel image processing using the command-line or Python code (Windhager, Bodenmiller, and Eling 2021). Supported tasks include IMC data preprocessing, supervised multi-channel image segmentation, object quantification and data export to a variety of file formats. It further allows deep-learning enabled image segmentation. The framework is available as platform-independent Docker container, ensuring reproducibility and user-friendly installation. Read more in the Docs.

In the section above, we can download example data and create a folder structure that the steinbock framework needs.
The basic input to steinbock looks as follows:

steinbock data/working directory
|
├── raw                       (user-provided, when starting from raw data)
  ├── panel.csv
  ├── Patient1.zip
  ├── Patient2.zip
  ├── Patient3.zip
  ├── Patient4.zip

The panel.csv contains information on the antibodies/channels used in the experiment. In the case of IMC data, this file needs to contain an entry to Metal Tag, Target (name of the antibody target), keep (which channels to analyse), deepcell (which channels to aggregate for deepcell segmentation).

Each .zip file contains IMC raw data of one slide. Multiple acquisitions/images are present in each file.

The following code chunk displays a bash script to run the steinbock framework from IMC raw data via image segmentation to single-cell data export.

#!/usr/bin/env bash
alias steinbock="docker run -v path/to/demos/data/steinbock:/data -u $(id -u):$(id -g) ghcr.io/bodenmillergroup/steinbock:0.14.2"

# panel pre-processing
steinbock preprocess imc panel --namecol Clean_Target

# file type conversion and filtering
steinbock preprocess imc images --hpf 50

# deep learning-based segmentation
steinbock segment deepcell --minmax

# measurement
steinbock measure intensities
steinbock measure regionprops
steinbock measure neighbors --type expansion --dmax 4

The steinbock preprocess imc panel call reads in an unformatted panel and creates a standardized panel format.

The steinbock preprocess imc images call reads in the IMC raw data, performs a “hot pixel filtering” and writes out one .tiff file per acquisition.

The steinbock segment deepcell --minmax uses a pre-trained neural network to perform single-cell segmentation of the images.

Finally, the steinbock measure intensities, steinbock measure regionprops and steinbock measure neighbors --type expansion --dmax 4 calls extract the mean pixel intensities per cell an channel, the morphological features of the cells and spatial cell neighbour graphs.

The final folder structure looks as follows:

steinbock data/working directory
|
├── raw                       (user-provided, when starting from raw data)
|
├── img                       (user-provided, when not starting from raw data)
├── panel.csv                 (user-provided, when not starting from raw data)
├── images.csv
|
├── masks
|
├── intensities
├── regionprops
└── neighbors

For easy access, we can now download the already pre-processed data:

# download intensities
url <- "https://zenodo.org/record/6642699/files/intensities.zip"
destfile <- "../data/steinbock/intensities.zip"
download.file(url, destfile)
unzip(destfile, exdir="../data/steinbock", overwrite=TRUE)
unlink(destfile)

# download regionprops
url <- "https://zenodo.org/record/6642699/files/regionprops.zip"
destfile <- "../data/steinbock/regionprops.zip"
download.file(url, destfile)
unzip(destfile, exdir="../data/steinbock", overwrite=TRUE)
unlink(destfile)


# download neighbors
url <- "https://zenodo.org/record/6642699/files/neighbors.zip"
destfile <- "../data/steinbock/neighbors.zip"
download.file(url, destfile)
unzip(destfile, exdir="../data/steinbock", overwrite=TRUE)
unlink(destfile)

# download images
url <- "https://zenodo.org/record/6642699/files/img.zip"
destfile <- "../data/steinbock/img.zip"
download.file(url, destfile)
unzip(destfile, exdir="../data/steinbock", overwrite=TRUE)
unlink(destfile)

# download masks
url <- "https://zenodo.org/record/6642699/files/masks_deepcell.zip"
destfile <- "../data/steinbock/masks_deepcell.zip"
download.file(url, destfile)
unzip(destfile, exdir="../data/steinbock", overwrite=TRUE)
unlink(destfile)

# download individual files
download.file("https://zenodo.org/record/6642699/files/panel.csv", 
              "../data/steinbock/panel.csv")
download.file("https://zenodo.org/record/6642699/files/images.csv", 
              "../data/steinbock/images.csv")

4 Data import into R

After image processing, we have developed R/Bioconductor packages that read in the processed single-cell data, the images and the segmnetation masks.

4.1 Reading in single-cell data

The imcRtools package supports the handling and analysis of imaging mass cytometry and other highly multiplexed imaging data. The main functionality includes reading in single-cell data after image segmentation and measurement, data formatting to perform channel spillover correction and a number of spatial analysis approaches.

In the first instance, we can read in the single-cell data as processed using steinbock by calling the read_steinbock function:

library(imcRtools)

spe <- read_steinbock("../data/steinbock/")
spe
## class: SpatialExperiment 
## dim: 40 46917 
## metadata(0):
## assays(1): counts
## rownames(40): MPO HistoneH3 ... DNA1 DNA2
## rowData names(11): channel name ... Final.Concentration...Dilution
##   uL.to.add
## colnames: NULL
## colData names(8): sample_id ObjectNumber ... width_px height_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : Pos_X Pos_Y
## imgData names(1): sample_id

By default, single-cell data is read in as SpatialExperiment object. The summarized pixel intensities per channel and cell (here mean intensity) are stored in the counts slot. Columns represent cells and rows represent channels.

counts(spe)[1:5,1:5]
##                [,1]       [,2]      [,3]      [,4]      [,5]
## MPO       0.6273888  0.4500000 0.5286462  1.019142 0.4000000
## HistoneH3 3.4116090 13.0472305 2.5331530  9.596759 2.8927974
## SMA       0.2837388  2.0064459 0.1631139  1.455513 0.2264860
## CD16      2.1288451  2.7879388 2.1463270 18.429319 0.8185134
## CD38      0.2760149  0.7482431 1.1644259  2.324016 0.5419144

Metadata associated to individual cells are stored in the colData slot. After initial image processing, these metadata include the numeric identifier (ObjectNumber), the area, and morphological features of each cell. In addition, sample_id stores the image name from which each cell was extracted and the width and height of the corresponding images are stored.

head(colData(spe))
## DataFrame with 6 rows and 8 columns
##      sample_id ObjectNumber      area major_axis_length minor_axis_length
##    <character>    <numeric> <numeric>         <numeric>         <numeric>
## 1 Patient1_001            1        11           7.00009           1.90442
## 2 Patient1_001            2        20          14.05565           1.95929
## 3 Patient1_001            3        16           9.16515           2.00000
## 4 Patient1_001            4        19           7.68906           3.08022
## 5 Patient1_001            5        10           6.00000           1.95959
## 6 Patient1_001            6        20           8.06811           3.14732
##   eccentricity  width_px height_px
##      <numeric> <numeric> <numeric>
## 1     0.962281       600       600
## 2     0.990237       600       600
## 3     0.975900       600       600
## 4     0.916254       600       600
## 5     0.945163       600       600
## 6     0.920775       600       600

The read_steinbock function can also read in the single-cell data as SingleCellExperiment object.

The main difference between the SpatialExperiment and the SingleCellExperiment data container in the current setting is the way spatial locations of all cells are stored. For the SingleCellExperiment container, the locations are stored in the colData slot while the SpatialExperiment container stores them in the spatialCoords slot:

head(spatialCoords(spe))
##      Pos_X     Pos_Y
## 1 468.8182 0.3636364
## 2 516.4500 0.4000000
## 3 587.5000 0.5000000
## 4 192.2632 0.8947368
## 5 231.5000 0.4000000
## 6 270.8000 0.8500000

The spatial object graphs generated by steinbock are read into a colPair slot of the SpatialExperiment (or SingleCellExperiment) object. Cell-cell interactions (cells in close spatial proximity) are represented as “edge list” (stored as SelfHits object). Here, the left side represents the column indices of the “from” cells and the right side represents the column indices of the “to” cells. In the last part of this workflow, we will highlight the visualization of the spatial object graphs.

colPair(spe, "neighborhood")
## SelfHits object with 247098 hits and 0 metadata columns:
##                 from        to
##            <integer> <integer>
##        [1]         1        27
##        [2]         1        54
##        [3]         2        10
##        [4]         2        43
##        [5]         3        16
##        ...       ...       ...
##   [247094]     46916     46894
##   [247095]     46917     46854
##   [247096]     46917     46879
##   [247097]     46917     46888
##   [247098]     46917     46912
##   -------
##   nnode: 46917

Finally, metadata regarding the channels are stored in the rowData slot. This information is extracted from the panel.csv file. Channels are ordered by isotope mass and therefore match the channel order of the multi-channel images.

head(rowData(spe))
## DataFrame with 6 rows and 11 columns
##               channel        name      keep   ilastik  deepcell Tube.Number
##           <character> <character> <numeric> <numeric> <numeric>   <numeric>
## MPO               Y89         MPO         1        NA        NA        2101
## HistoneH3       In113   HistoneH3         1         1         1        2113
## SMA             In115         SMA         1        NA        NA        1914
## CD16            Pr141        CD16         1        NA        NA        2079
## CD38            Nd142        CD38         1        NA        NA        2095
## HLADR           Nd143       HLADR         1        NA        NA        2087
##                        Target Antibody.Clone Stock.Concentration
##                   <character>    <character>           <numeric>
## MPO       Myeloperoxidase MPO Polyclonal MPO                 500
## HistoneH3          Histone H3           D1H2                 500
## SMA                       SMA            1A4                 500
## CD16                     CD16       EPR16784                 500
## CD38                     CD38        EPR4106                 500
## HLADR                  HLA-DR        TAL 1B5                 500
##           Final.Concentration...Dilution   uL.to.add
##                              <character> <character>
## MPO                              4 ug/mL         0.8
## HistoneH3                        1 ug/mL         0.2
## SMA                           0.25 ug/mL        0.05
## CD16                             5 ug/mL           1
## CD38                           2.5 ug/mL         0.5
## HLADR                            1 ug/mL         0.2

We already provide a SpatialExperiment object that contain the cell phenotype information as detected following the IMC data analysis book.

download.file("https://zenodo.org/record/6810879/files/spe.rds",
              "../data/spe.rds")

(spe <- readRDS("../data/spe.rds"))
## class: SpatialExperiment 
## dim: 40 46825 
## metadata(4): color_vectors cluster_codes SOM_codes delta_area
## assays(2): counts exprs
## rownames(40): MPO HistoneH3 ... DNA1 DNA2
## rowData names(15): channel name ... marker_class used_for_clustering
## colnames(46825): Patient1_001_1 Patient1_001_2 ... Patient4_008_2772
##   Patient4_008_2773
## colData names(20): sample_id ObjectNumber ... cell_labels celltype
## reducedDimNames(8): UMAP TSNE ... seurat UMAP_seurat
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : Pos_X Pos_Y
## imgData names(1): sample_id

4.2 Reading in images

We developed the cytomapper package for handling and visualization of multichannel images and segmentation masks. The main functions of this package allow 1. reading in multichannel images and segmentation mask, 2. the visualisation of pixel-level information across multiple channels, 3. the display of cell-level information (expression and/or metadata) on segmentation masks and 4. gating and visualisation of single cells.

The loadImages function is used to read in processed multi-channel images and their corresponding segmentation masks. Of note, the multi-channel images generated by steinbock are saved as 32-bit images while the segmentation masks are saved as 16-bit images. To correctly scale pixel values of the segmentation masks when reading them in set as.is = TRUE.

library(cytomapper)
## Loading required package: EBImage
## 
## Attaching package: 'EBImage'
## The following object is masked from 'package:SummarizedExperiment':
## 
##     resize
## The following object is masked from 'package:Biobase':
## 
##     channel
## The following objects are masked from 'package:GenomicRanges':
## 
##     resize, tile
## The following objects are masked from 'package:IRanges':
## 
##     resize, tile
## 
## Attaching package: 'cytomapper'
## The following objects are masked from 'package:Biobase':
## 
##     channelNames, channelNames<-
images <- loadImages("../data/steinbock/img/")
## All files in the provided location will be read in.
masks <- loadImages("../data/steinbock/masks_deepcell/", as.is = TRUE)
## All files in the provided location will be read in.

In the case of multi-channel images, it is beneficial to set the channelNames for easy visualization. Using the steinbock framework, the channel order of the single-cell data matches the channel order of the multi-channel images. However, it is recommended to make sure that the channel order is identical between the single-cell data and the images.

channelNames(images) <- rownames(spe)
images
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008 
## Each image contains 40 channel(s)
## channelNames(40): MPO HistoneH3 SMA CD16 CD38 HLADR CD27 CD15 CD45RA CD163 B2M CD20 CD68 Ido1 CD3 LAG3 / LAG33 CD11c PD1 PDGFRb CD7 GrzB PDL1 TCF7 CD45RO FOXP3 ICOS CD8a CarbonicAnhydrase CD33 Ki67 VISTA CD40 CD4 CD14 Ecad CD303 CD206 cleavedPARP DNA1 DNA2

For further visualization we will need to add additional metadata to the elementMetadata slot of the CytoImageList objects. This slot is easily accessible using the mcols function.

Here, we will save the matched sample_id, patient_id and indication information within the elementMetadata slot of the multi-channel images and segmentation masks objects. It is crucial that the order of the images in both CytoImageList objects is the same.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::collapse()   masks IRanges::collapse()
## ✖ dplyr::combine()    masks EBImage::combine(), Biobase::combine(), BiocGenerics::combine()
## ✖ dplyr::count()      masks matrixStats::count()
## ✖ dplyr::desc()       masks IRanges::desc()
## ✖ tidyr::expand()     masks S4Vectors::expand()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::first()      masks S4Vectors::first()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## ✖ purrr::reduce()     masks GenomicRanges::reduce(), IRanges::reduce()
## ✖ dplyr::rename()     masks S4Vectors::rename()
## ✖ dplyr::slice()      masks IRanges::slice()
## ✖ purrr::transpose()  masks EBImage::transpose()
all.equal(names(images), names(masks))
## [1] TRUE
mcols(images) <- mcols(masks) <- DataFrame(sample_id = names(images))

4.3 Generate single-cell data from images

An alternative way of generating a SingleCellExperiment object directly from the multi-channel images and segmentation masks is supported by the measureObjects function of the cytomapper package. For each cell present in the masks object, the function computes the mean pixel intensity per channel as well as morphological features (area, radius, major axis length, eccentricity) and the location of cells:

cytomapper_sce <- measureObjects(masks, image = images, img_id = "sample_id")

cytomapper_sce
## class: SingleCellExperiment 
## dim: 40 46917 
## metadata(0):
## assays(1): counts
## rownames(40): MPO HistoneH3 ... DNA1 DNA2
## rowData names(0):
## colnames: NULL
## colData names(8): sample_id object_id ... m.majoraxis m.eccentricity
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

5 Image visualization

In the next section, we will highlight the use of the cytomapper package to visualize multichannel images and segmentation masks.

For convenience, we will select three example images and their corresponding segmentation masks.

# Sample images
set.seed(220517)
cur_id <- sample(unique(spe$sample_id), 3)
cur_images <- images[names(images) %in% cur_id]
cur_masks <- masks[names(masks) %in% cur_id]

5.1 Pixel visualization

The following section gives examples for visualizing multiple channels as pseudo-color composite images. For this the cytomapper package exports the plotPixels function which expects a CytoImageList object storing one or multiple multi-channel images.

The following example highlights the visualization of 6 markers (maximum allowed number of markers) at once per image. The markers indicate the spatial distribution of tumor cells (E-caherin), T cells (CD3), B cells (CD20), CD8+ T cells (CD8a), plasma cells (CD38) and proliferating cells (Ki67).

plotPixels(cur_images, 
           colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"),
           bcg = list(Ecad = c(0, 5, 1),
                      CD3 = c(0, 5, 1),
                      CD20 = c(0, 5, 1),
                      CD8a = c(0, 5, 1),
                      CD38 = c(0, 8, 1),
                      Ki67 = c(0, 5, 1)))

5.2 Cell visualization

In the following section, we will show examples on how to visualize single cells as segmentation masks. This type of visualization allows to observe the spatial distribution of cell phenotypes, the visual assessment of morphological features and quality control in terms of cell segmentation and phenotyping.

The cytomapper package provides the plotCells function that accepts a CytoImageList object containing segmentation masks. These are defined as single channel images where sets of pixels with the same integer ID identify individual cells. This integer ID can be found as an entry in the colData(spe) slot and as pixel information in the segmentation masks. The entry in colData(spe) needs to be specified via the cell_id argument to the plotCells function. In that way, data contained in the SpatialExperiment object can be mapped to segmentation masks. For the current dataset, the cell IDs are stored in colData(spe)$ObjectNumber.

As cell IDs are only unique within a single image, plotCells also requires the img_id argument. This argument specifies the colData(spe) as well as the mcols(masks) entry that stores the unique image name from which each cell was extracted. In the current dataset the unique image names are stored in colData(spe)$sample_id and mcols(masks)$sample_id.

Providing these two entries that allow mapping between the SpatialExperiment object and segmentation masks, we can now color individual cells based on their cell type:

plotCells(cur_masks,
          object = spe, 
          cell_id = "ObjectNumber", img_id = "sample_id",
          colour_by = "celltype")

For consistent visualization, the plotCells function takes a named list as color argument. The entry name must match the colour_by argument.

plotCells(cur_masks,
          object = spe, 
          cell_id = "ObjectNumber", img_id = "sample_id",
          colour_by = "celltype",
          colour = list(celltype = metadata(spe)$color_vectors$celltype))

If only individual cell types should be visualized, the SpatialExperiment object can be subsetted (e.g., to only contain CD8+ T cells). In the following example CD8+ T cells are colored in red and all other cells that are not contained in the dataset are colored in white (as set by the missing_color argument).

CD8 <- spe[,spe$celltype == "CD8"]

plotCells(cur_masks,
          object = CD8, 
          cell_id = "ObjectNumber", img_id = "sample_id",
          colour_by = "celltype",
          colour = list(celltype = c(CD8 = "red")),
          missing_colour = "white")

6 Spatial analysis

The imcRtools package contains a number of spatial analysis approaches. First, cell-cell interactions are detected via spatial graph construction; these graphs can be visualized with cells representing nodes and interactions representing edges. Furthermore, per cell, its direct neighbours are summarized to allow spatial clustering. Per image/grouping level, interactions between types of cells are counted, averaged and compared against random permutations. In that way, types of cells that interact more (attraction) or less (avoidance) frequently than expected by chance are detected.

6.1 Spatial interaction graphs

Many spatial analysis approaches either compare the observed versus expected number of cells around a given cell type (point process) or utilize interaction graphs (spatial object graphs) to estimate clustering or interaction frequencies between cell types.

The steinbock framework allows the construction of these spatial graphs. During image processing, we have constructed a spatial graph by expanding the individual cell masks by 4 pixels.

The imcRtools package further allows the ad hoc consctruction of spatial graphs directly using a SpatialExperiment or SingleCellExperiment object while considering the spatial location (centroids) of individual cells. The buildSpatialGraph function allows constructing spatial graphs by detecting the k-nearest neighbors in 2D (knn), by detecting all cells within a given distance to the center cell (expansion) and by Delaunay triangulation (delaunay).

When constructing a knn graph, the number of neighbors (k) needs to be set and (optionally) the maximum distance to consider (max_dist) can be specified. When constructing a graph via expansion, the distance to expand (threshold) needs to be provided. For graphs constructed via Delaunay triangulation, the max_dist parameter can be set to avoid unusually large connections at the edge of the image.

spe <- buildSpatialGraph(spe, img_id = "sample_id", type = "knn", k = 20)
spe <- buildSpatialGraph(spe, img_id = "sample_id", type = "expansion", threshold = 20)
spe <- buildSpatialGraph(spe, img_id = "sample_id", type = "delaunay", max_dist = 50)

The spatial graphs are stored in colPair(spe, name) slots. These slots store SelfHits objects representing edge lists in which the first column indicates the index of the “from” cell and the second column the index of the “to” cell. Each edge list is newly constructed when subsetting the object.

colPairNames(spe)
## [1] "neighborhood"                "knn_interaction_graph"      
## [3] "expansion_interaction_graph" "delaunay_interaction_graph"

Here, colPair(spe, "neighborhood") stores the spatial graph constructed by steinbock, colPair(spe, "knn_interaction_graph") stores the knn spatial graph, colPair(spe, "expansion_interaction_graph") stores the expansion graph and colPair(spe, "delaunay_interaction_graph") stores the graph constructed by Delaunay triangulation.

6.2 Spatial visualization

The previous section highlights the use of the cytomapper package to visualize multichannel images and segmentation masks. Here, we introduce the plotSpatial function of the imcRtools package to visualize the cells’ centroids and cell-cell interactions as spatial graphs.

In the following example, we select one image for visualization purposes. Here, each dot (node) represents a cell and edges are drawn between cells in close physical proximity as detected by steinbock or the buildSpatialGraph function. Nodes are variably colored based on the cell type and edges are colored in grey.

library(ggplot2)
library(viridis)

# steinbock interaction graph 
plotSpatial(spe[,spe$sample_id == "Patient3_001"], 
            node_color_by = "celltype", 
            img_id = "sample_id", 
            draw_edges = TRUE, 
            colPairName = "neighborhood", 
            nodes_first = FALSE, 
            edge_color_fix = "grey") + 
    scale_color_manual(values = metadata(spe)$color_vectors$celltype) +
    ggtitle("steinbock interaction graph")

# knn interaction graph 
plotSpatial(spe[,spe$sample_id == "Patient3_001"], 
            node_color_by = "celltype", 
            img_id = "sample_id", 
            draw_edges = TRUE, 
            colPairName = "knn_interaction_graph", 
            nodes_first = FALSE,
            edge_color_fix = "grey") + 
    scale_color_manual(values = metadata(spe)$color_vectors$celltype) +
    ggtitle("knn interaction graph")

# expansion interaction graph 
plotSpatial(spe[,spe$sample_id == "Patient3_001"], 
            node_color_by = "celltype", 
            img_id = "sample_id", 
            draw_edges = TRUE, 
            colPairName = "expansion_interaction_graph", 
            nodes_first = FALSE, 
            directed = FALSE,
            edge_color_fix = "grey") + 
    scale_color_manual(values = metadata(spe)$color_vectors$celltype) +
    ggtitle("expansion interaction graph")

# delaunay interaction graph 
plotSpatial(spe[,spe$sample_id == "Patient3_001"], 
            node_color_by = "celltype", 
            img_id = "sample_id", 
            draw_edges = TRUE, 
            colPairName = "delaunay_interaction_graph", 
            nodes_first = FALSE,
            edge_color_fix = "grey") + 
    scale_color_manual(values = metadata(spe)$color_vectors$celltype) +
    ggtitle("delaunay interaction graph")

Finally, the plotSpatial function allows displaying all images at once. This visualization can be useful to quickly detect larger structures of interest.

plotSpatial(spe, 
            node_color_by = "celltype", 
            img_id = "sample_id", 
            node_size_fix = 0.5) + 
    scale_color_manual(values = metadata(spe)$color_vectors$celltype)

6.3 Cellular neighborhood analysis

The following section highlights the use of the imcRtools package to detect cellular neighborhoods. This approach has been proposed by (Goltsev et al. 2018) and (Schürch et al. 2020) to group cells based on information contained in their direct neighborhood.

(Goltsev et al. 2018) perfomed Delaunay triangulation-based graph construction, neighborhood aggregation and then clustered cells. (Schürch et al. 2020) on the other hand constructed a 10-nearest neighbor graph before aggregating information across neighboring cells.

In the following code chunk we will use the 20-nearest neighbor graph as constructed above to define the direct cellular neighborhood. The aggregateNeighbors function allows neighborhood aggregation in 2 different ways:

  1. For each cell the function computes the fraction of cells of a certain type (e.g., cell type) among its neighbors.
  2. For each cell it aggregates (e.g., mean) the expression counts across all neighboring cells.

Based on these measures, cells can now be clustered into cellular neighborhoods. We will first compute the fraction of the different cell types among the 20-nearest neighbors and use kmeans clustering to group cells into 10 cellular neighborhoods.

# By celltypes
spe <- aggregateNeighbors(spe, colPairName = "knn_interaction_graph", 
                          aggregate_by = "metadata", count_by = "celltype")

set.seed(220705)

cn_1 <- kmeans(spe$aggregatedNeighbors, centers = 10)
spe$cn_celltypes <- as.factor(cn_1$cluster)

plotSpatial(spe, 
            node_color_by = "cn_celltypes", 
            img_id = "sample_id", 
            node_size_fix = 0.5) +
    scale_color_brewer(palette = "Set3")

The next code chunk visualizes the cell type compositions of the detected cellular neighborhoods (CN).

library(pheatmap)

for_plot <- prop.table(table(spe$cn_celltypes, spe$celltype), margin = 1)

pheatmap(for_plot, 
         color = colorRampPalette(c("dark blue", "white", "dark red"))(100), 
         scale = "column")

CN 10 and CN 1 are mainly composed of tumor cells with CN 10 forming the tumor/stroma border. CN 8 is mainly composed of B and BnT cells indicating TLS. CN 9 is composed of aggregated plasma cells and CN 3 contains most T cells.

6.4 Patch detection

The previous section focused on detecting cellular neighborhoods in a rather unsupervised fashion. However, the imcRtools package also provides methods for detecting spatial compartments in a supervised fashion. The patchDetection function allows the detection of connected sets of similar cells as proposed by (Hoch et al. 2022). In the following example, we will use the patchDetection function to detect function to detect tumor patches in three steps:

  1. Find connected sets of tumor cells (using the steinbock graph).
  2. Components which contain less than 10 cells are excluded.
  3. Expand the components by 1µm to construct a concave hull around the patch and include cells within the patch.
spe <- patchDetection(spe, 
                      patch_cells = spe$celltype == "Tumor",
                      img_id = "sample_id",
                      expand_by = 1,
                      min_patch_size = 10,
                      colPairName = "neighborhood")

plotSpatial(spe, 
            node_color_by = "patch_id", 
            img_id = "sample_id", 
            node_size_fix = 0.5) +
    theme(legend.position = "none") +
    scale_color_manual(values = rev(colors()))

6.5 Interaction analysis

The next section focuses on statistically testing the pairwise interaction between all cell types of the dataset. For this, the imcRtools package provides the testInteractions function which implements the interaction testing strategy proposed by (Schapiro et al. 2017).

Per grouping level (e.g., image), the testInteractions function computes the averaged cell type/cell type interaction count and computes this count against an empirical null distribution which is generated by permuting all cell labels (while maintaining the tissue structure).

In the following example, we use the steinbock generated spatial interaction graph and estimate the interaction or avoidance between cell types in the dataset.

out <- testInteractions(spe, 
                        group_by = "sample_id",
                        label = "celltype", 
                        colPairName = "neighborhood", 
                        iter = 200)

head(out)
## DataFrame with 6 rows and 10 columns
##       group_by from_label   to_label        ct       p_gt      p_lt interaction
##    <character>   <factor>   <factor> <numeric>  <numeric> <numeric>   <logical>
## 1 Patient1_001      Bcell Bcell              1 0.00497512  1.000000        TRUE
## 2 Patient1_001      Bcell BnTcell            0 1.00000000  1.000000       FALSE
## 3 Patient1_001      Bcell CD4                2 0.00497512  1.000000        TRUE
## 4 Patient1_001      Bcell CD8                0 1.00000000  0.835821       FALSE
## 5 Patient1_001      Bcell Myeloid            0 1.00000000  0.651741       FALSE
## 6 Patient1_001      Bcell Neutrophil        NA         NA        NA          NA
##            p       sig    sigval
##    <numeric> <logical> <numeric>
## 1 0.00497512      TRUE         1
## 2 1.00000000     FALSE         0
## 3 0.00497512      TRUE         1
## 4 0.83582090     FALSE         0
## 5 0.65174129     FALSE         0
## 6         NA        NA        NA

The returned DataFrame contains the test results per grouping level (in this case the image ID, group_by), “from” cell type (from_label) and “to” cell type (to_label). The sigval entry indicates if a pair of cell types is significantly interacting (sigval = 1), if a pair of cell types is significantly avoiding (sigval = -1) or if no significant interaction or avoidance was detected.

These results can be visualized by computing the sum of the sigval entries across all images:

library(scales)
out %>% as_tibble() %>%
    group_by(from_label, to_label) %>%
    summarize(sum_sigval = sum(sigval, na.rm = TRUE)) %>%
    ggplot() +
        geom_tile(aes(from_label, to_label, fill = sum_sigval)) +
        scale_fill_gradient2(low = muted("blue"), mid = "white", high = muted("red")) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

In the plot above the red tiles indicate cell type pairs that were detected to significantly interact on a large number of images. On the other hand, blue tiles show cell type pairs which tend to avoid each other on a large number of images.

Here we can observe that tumor cells are mostly compartmentalized and are in avoidance with other cell types. As expected, B cells interact with BnT cells; regulatory T cells interact with CD4+ T cells and CD8+ T cells. Most cell types show self interactions indicating spatial clustering.

7 Further resources

The IMC data analysis book contains a detailed overview on the presented and other approaches for multiplexed image analysis and visualziation.

The steinbock framework provides functionality for image processing.

The ImcSegmentationPipeline provides a GUI-based version of the segmentation pipeline based on Ilastik pixel classification and image segmentation via CellProfiler.

The cytomapper package allows handling and visualization of multichannel images and segmentation masks directly in R.

The imcRtools package supports reading single-cell data from segmented images, multi-channel spillover correction, and spatial data analysis.

The imcdatasets R/Bioconductor package contains a number of publically available IMC datasets.

For a full overview on the presented approaches, please refer to the preprint:

Jonas Windhager, Bernd Bodenmiller, Nils Eling. And end-to-end workflow for multi-channel image processing and analysis. bioRxiv, 2021

8 Acknowledgments

Jonas Windhager developed the steinbock framework. Vito Zanotelli developed the IMC segmentation pipeline. Daniel Schulz, Tobias Hoch, Lasse Meyer, Jana Fischer and Vito Zanotelli provided code for the imcRtools package. Nicolas Damond and Tobias Hoch provided code for the cytomapper package.

Nils Eling is funded by Marie Sklodowska Curie Actions.

Session info

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scales_1.2.0                pheatmap_1.0.12            
##  [3] viridis_0.6.2               viridisLite_0.4.0          
##  [5] forcats_0.5.1               stringr_1.4.0              
##  [7] dplyr_1.0.9                 purrr_0.3.4                
##  [9] readr_2.1.2                 tidyr_1.2.0                
## [11] tibble_3.1.7                ggplot2_3.3.6              
## [13] tidyverse_1.3.2             cytomapper_1.9.1           
## [15] EBImage_4.39.0              imcRtools_1.3.5            
## [17] SpatialExperiment_1.7.0     SingleCellExperiment_1.19.0
## [19] SummarizedExperiment_1.27.1 Biobase_2.57.1             
## [21] GenomicRanges_1.49.0        GenomeInfoDb_1.33.3        
## [23] IRanges_2.31.0              S4Vectors_0.35.1           
## [25] BiocGenerics_0.43.0         MatrixGenerics_1.9.1       
## [27] matrixStats_0.62.0          BiocStyle_2.25.0           
## 
## loaded via a namespace (and not attached):
##   [1] R.methodsS3_1.8.2           bit64_4.0.5                
##   [3] knitr_1.39                  irlba_2.3.5                
##   [5] multcomp_1.4-19             DelayedArray_0.23.0        
##   [7] R.utils_2.12.0              data.table_1.14.2          
##   [9] RCurl_1.98-1.7              doParallel_1.0.17          
##  [11] generics_0.1.3              flowCore_2.9.1             
##  [13] ScaledMatrix_1.5.0          terra_1.5-21               
##  [15] cowplot_1.1.1               TH.data_1.1-1              
##  [17] proxy_0.4-27                bit_4.0.4                  
##  [19] tzdb_0.3.0                  xml2_1.3.3                 
##  [21] lubridate_1.8.0             httpuv_1.6.5               
##  [23] assertthat_0.2.1            gargle_1.2.0               
##  [25] fontawesome_0.3.0           xfun_0.31                  
##  [27] hms_1.1.1                   jquerylib_0.1.4            
##  [29] evaluate_0.15               promises_1.2.0.1           
##  [31] fansi_1.0.3                 readxl_1.4.0               
##  [33] dbplyr_2.2.1                igraph_1.3.4               
##  [35] DBI_1.1.3                   CATALYST_1.21.2            
##  [37] htmlwidgets_1.5.4           googledrive_2.0.0          
##  [39] ellipsis_0.3.2              ggnewscale_0.4.7           
##  [41] ggpubr_0.4.0                backports_1.4.1            
##  [43] V8_4.2.0                    cytolib_2.9.1              
##  [45] bookdown_0.27               svgPanZoom_0.3.4           
##  [47] RcppParallel_5.1.5          sparseMatrixStats_1.9.0    
##  [49] vctrs_0.4.1                 abind_1.4-5                
##  [51] withr_2.5.0                 cachem_1.0.6               
##  [53] ggforce_0.3.3               vroom_1.5.7                
##  [55] svglite_2.1.0               cluster_2.1.3              
##  [57] crayon_1.5.1                drc_3.0-1                  
##  [59] labeling_0.4.2              edgeR_3.39.3               
##  [61] pkgconfig_2.0.3             units_0.8-0                
##  [63] tweenr_1.0.2                vipor_0.4.5                
##  [65] rlang_1.0.4                 lifecycle_1.0.1            
##  [67] sandwich_3.0-2              modelr_0.1.8               
##  [69] rsvd_1.0.5                  cellranger_1.1.0           
##  [71] polyclip_1.10-0             tiff_0.1-11                
##  [73] Matrix_1.4-1                raster_3.5-21              
##  [75] carData_3.0-5               Rhdf5lib_1.19.2            
##  [77] zoo_1.8-10                  reprex_2.0.1               
##  [79] beeswarm_0.4.0              RTriangle_1.6-0.10         
##  [81] googlesheets4_1.0.0         ggridges_0.5.3             
##  [83] GlobalOptions_0.1.2         png_0.1-7                  
##  [85] rjson_0.2.21                bitops_1.0-7               
##  [87] shinydashboard_0.7.2        R.oo_1.25.0                
##  [89] ConsensusClusterPlus_1.61.0 KernSmooth_2.23-20         
##  [91] rhdf5filters_1.9.0          DelayedMatrixStats_1.19.0  
##  [93] shape_1.4.6                 classInt_0.4-7             
##  [95] jpeg_0.1-9                  rstatix_0.7.0              
##  [97] ggsignif_0.6.3              beachmat_2.13.4            
##  [99] magrittr_2.0.3              plyr_1.8.7                 
## [101] zlibbioc_1.43.0             compiler_4.2.0             
## [103] dqrng_0.3.0                 concaveman_1.1.0           
## [105] RColorBrewer_1.1-3          plotrix_3.8-2              
## [107] clue_0.3-61                 cli_3.3.0                  
## [109] XVector_0.37.0              FlowSOM_2.5.2              
## [111] MASS_7.3-58                 tidyselect_1.1.2           
## [113] stringi_1.7.8               RProtoBufLib_2.9.0         
## [115] highr_0.9                   yaml_2.3.5                 
## [117] BiocSingular_1.13.0         locfit_1.5-9.6             
## [119] ggrepel_0.9.1               grid_4.2.0                 
## [121] sass_0.4.2                  tools_4.2.0                
## [123] parallel_4.2.0              circlize_0.4.15            
## [125] rstudioapi_0.13             foreach_1.5.2              
## [127] gridExtra_2.3               farver_2.1.1               
## [129] Rtsne_0.16                  ggraph_2.0.5               
## [131] DropletUtils_1.17.0         digest_0.6.29              
## [133] BiocManager_1.30.18         shiny_1.7.2                
## [135] Rcpp_1.0.9                  car_3.1-0                  
## [137] broom_1.0.0                 scuttle_1.7.2              
## [139] later_1.3.0                 httr_1.4.3                 
## [141] sf_1.0-7                    ComplexHeatmap_2.13.0      
## [143] colorspace_2.0-3            rvest_1.0.2                
## [145] fs_1.5.2                    XML_3.99-0.10              
## [147] splines_4.2.0               scater_1.25.2              
## [149] graphlayouts_0.8.0          sp_1.5-0                   
## [151] systemfonts_1.0.4           xtable_1.8-4               
## [153] jsonlite_1.8.0              tidygraph_1.2.1            
## [155] R6_2.5.1                    pillar_1.8.0               
## [157] htmltools_0.5.3             mime_0.12                  
## [159] nnls_1.4                    glue_1.6.2                 
## [161] fastmap_1.1.0               DT_0.23                    
## [163] BiocParallel_1.31.10        BiocNeighbors_1.15.1       
## [165] fftwtools_0.9-11            class_7.3-20               
## [167] codetools_0.2-18            mvtnorm_1.1-3              
## [169] utf8_1.2.2                  lattice_0.20-45            
## [171] bslib_0.4.0                 curl_4.3.2                 
## [173] ggbeeswarm_0.6.0            colorRamps_2.3.1           
## [175] gtools_3.9.3                magick_2.7.3               
## [177] survival_3.3-1              limma_3.53.4               
## [179] rmarkdown_2.14              munsell_0.5.0              
## [181] e1071_1.7-11                GetoptLong_1.0.5           
## [183] rhdf5_2.41.1                GenomeInfoDbData_1.2.8     
## [185] iterators_1.0.14            HDF5Array_1.25.1           
## [187] haven_2.5.0                 reshape2_1.4.4             
## [189] gtable_0.3.0
Angelo, Michael, Sean C. Bendall, Rachel Finck, Matthew B. Hale, Chuck Hitzman, Alexander D. Borowsky, Richard M. Levenson, et al. 2014. “Multiplexed Ion Beam Imaging of Human Breast Tumors.” Nature Medicine 20 (4): 436–42.
Eling, Nils, Nicolas Damond, Tobias Hoch, and Bernd Bodenmiller. 2020. “Cytomapper: An r/Bioconductor Package for Visualization of Highly Multiplexed Imaging Data.” Bioinformatics 36 (24): 5706--5708. https://doi.org/10.1093/bioinformatics/btaa1061.
Giesen, Charlotte, Hao A. O. Wang, Denis Schapiro, Nevena Zivanovic, Andrea Jacobs, Bodo Hattendorf, Peter J. Schüffler, et al. 2014. “Highly Multiplexed Imaging of Tumor Tissues with Subcellular Resolution by Mass Cytometry.” Nature Methods 11 (4): 417–22.
Goltsev, Yury, Nikolay Samusik, Julia Kennedy-Darling, Salil Bhate, Matthew Hale, Gustavo Vazquez, Sarah Black, and Garry P. Nolan. 2018. “Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging.” Cell 174: 968–81.
Gut, Gabriele, Markus D Herrmann, and Lucas Pelkmans. 2018. “Multiplexed Protein Maps Link Subcellular Organization to Cellular States.” Science 361: 1–13.
Hoch, Tobias, Daniel Schulz, Nils Eling, Julia Martínez Gómez, Mitchell P. Levesque, and Bernd Bodenmiller. 2022. “Multiplexed Imaging Mass Cytometry of the Chemokine Milieus in Melanoma Characterizes Features of the Response to Immunotherapy.” Science Immunology 7 (70): eabk1692. https://doi.org/10.1126/sciimmunol.abk1692.
Lin, Jia-Ren, Benjamin Izar, Shu Wang, Clarence Yapp, Shaolin Mei, Parin M. Shah, Sandro Santagata, and Peter K. Sorger. 2018. “Highly Multiplexed Immunofluorescence Imaging of Human Tissues and Tumors Using t-CyCIF and Conventional Optical Microscopes.” eLife 7: 1–46.
Saka, Sinem K., Yu Wang, Jocelyn Y. Kishi, Allen Zhu, Yitian Zeng, Wenxin Xie, Koray Kirli, et al. 2019. “Immuno-SABER Enables Highly Multiplexed and Amplified Protein Imaging in Tissues.” Nature Biotechnology 37: 1080–90.
Schapiro, Denis, Hartland W Jackson, Swetha Raghuraman, Jana R Fischer, Vito RT Zanotelli, Daniel Schulz, Charlotte Giesen, Raúl Catena, Zsuzsanna Varga, and Bernd Bodenmiller. 2017. “histoCAT: Analysis of Cell Phenotypes and Interactions in Multiplex Image Cytometry Data.” Nature Methods 14: 873--876.
Schulz, Daniel, Vito RT Zanotelli, Rana R Fischer, Denis Schapiro, Stefanie Engler, Xiao-Kang Lun, Hartland W Jackson, and Bernd Bodenmiller. 2018. “Simultaneous Multiplexed Imaging of mRNA and Proteins with Subcellular Resolution in Breast Cancer Tissue Samples by Mass Cytometry.” Cell Systems 6: 25–36.e5.
Schürch, Christian M, Salil S Bhate, Graham L Barlow, Darci J Phillips, Luca Noti, Inti Zlobec, Pauline Chu, et al. 2020. “Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front.” Cell 182: 1341–59.
Windhager, Jonas, Bernd Bodenmiller, and Nils Eling. 2021. “An End-to-End Workflow for Multiplexed Image Processing and Analysis.” bioRxiv.